Conformational fluctuations are believed to play an important role in the process by which transcription factor proteins locate and bind their target site on the genome of a bacterium. Using a simple model, we show that the binding time can be minimized, under selective pressure, by adjusting the spectrum of conformational states so that the fraction of time spent in more mobile conformations is matched with the target recognition rate. The associated optimal binding time is then within an order of magnitude of the limiting binding time imposed by thermodynamics, corresponding to an idealized protein with instant target recognition. Numerical estimates suggest that typical bacteria operate in this regime of optimized conformational fluctuations.
The ability of bacteria to respond within minutes to changes in their environment relies on genetic switches that are controlled by transcription factors. Transcription factors are proteins that, following activation by an environmental change, are able to locate a specific region (the "operator sequence") along the bacterial genome and bind to it, thereby regulating the expression of a gene (or group of genes) adjacent to that region [1]. The number of copies of a transcription factor protein associated with a specific gene varies, but it is typically of order $10^2$. Because bacterial genomes contain of order $10^7$ sites, a transcription factor must be able to "scan" the DNA for the target site at a rate of $10^5$ sites per second or faster for at least one copy to reach the target site within seconds. Note that after locating the target site, the transcription factor still has to bind to it in order to regulate the expression of the gene.
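As a quick check of the arithmetic above, the following Python sketch divides the genome among the protein copies. It assumes, purely for illustration, that the copies scan non-overlapping stretches of DNA in parallel and takes "within seconds" to mean one second; the constant and function names are hypothetical.

```python
# Order-of-magnitude scan-rate estimate (illustrative constants, not measured data)
GENOME_SITES = 1e7    # bacterial genome size, in binding sites
COPIES = 1e2          # copies of the transcription factor per cell
SEARCH_TIME_S = 1.0   # "within seconds", taken here as one second

def required_scan_rate(genome_sites: float, copies: float, time_s: float) -> float:
    """Sites/s each copy must scan if the copies split the genome evenly (an assumption)."""
    return genome_sites / (copies * time_s)

rate = required_scan_rate(GENOME_SITES, COPIES, SEARCH_TIME_S)
print(f"required scan rate per copy: {rate:.0e} sites/s")
```

Overlapping or repeated visits to the same sites, which the text discusses next, would only raise the required rate.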

A series of classical papers on the search process [2, 3, 4] culminated in the work of Berg, Winter and von Hippel (BWH), who showed, for the canonical case of the lac repressor protein of the bacterium E. coli, that the search does not proceed by straightforward 3D diffusion to the target binding site but rather by a slide-jump combination: 1D diffusional sliding along the DNA chain alternating with 3D diffusional jumps between different DNA segments. By restricting part of the search to the 1D "target space", the binding rate is effectively enhanced with respect to a pure 3D search, while the 3D jumps reduce the repeated visits to the same sites that characterize purely 1D diffusive searches. This scenario is made possible by a modest, non-specific electrostatic affinity between the transcription factor and duplex DNA. BWH also provided evidence that, under physiological conditions, the search time has a minimum with respect to the strength of this non-specific affinity, which may be the result of evolutionary optimization under selective pressure. Subsequent structural studies [6] have shown that the DNA-binding domains of the lac repressor are subject to strong conformational fluctuations when the protein is in contact with non-operator DNA. If the binding domain is in contact with operator-sequence DNA, then the protein can undergo a large-scale conformational change to a stable structure with direct contacts between the amino-acid side chains and the DNA bases.

It would seem obvious that the delay time between activation and binding of a transcription factor to the operator sequence (the "binding time") is minimized by maximizing the 1D diffusion constant $D_1$. However, simply increasing the transport rate impairs the accuracy, or fidelity, with which the protein can distinguish a right from a wrong site. Specifically, if the binding of a transcription factor to the target site is characterized by a certain rate $\Omega$, then the protein is likely to overshoot the target site if the jump rate $D_1/a^2$ between sites, with $a$ the spacing between protein binding sites, is large compared to $\Omega$. Similar conflicts between process speed and process fidelity are familiar from DNA replication and transcription, where increased reaction rates increase the number of replication and transcription errors.
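The overshoot criterion can be made concrete with the typical values quoted later in the text ($D_1 \sim 10^{-9}$ cm$^2$/s, $a = 0.34$ nm). The sketch below (hypothetical helper names, CGS units) computes the hop rate $D_1/a^2$ and its ratio to $\Omega$ for binding rates between $10^3$ and $10^6$ Hz; it only compares rates and does not model the search itself.

```python
# Hop rate between neighbouring sites versus final binding rate (CGS units)
D1_CM2_S = 1e-9      # 1D diffusion constant, typical value quoted in the text
A_CM = 0.34e-7       # site spacing a = 0.34 nm expressed in cm

def hop_rate(d1: float, a: float) -> float:
    """Site-to-site jump rate D1 / a^2, in Hz."""
    return d1 / (a * a)

def overshoot_ratio(d1: float, a: float, omega: float) -> float:
    """(D1/a^2) / Omega; values much larger than 1 signal overshoot."""
    return hop_rate(d1, a) / omega

print(f"D1/a^2 = {hop_rate(D1_CM2_S, A_CM):.2e} Hz")
for omega_hz in (1e3, 1e6):
    print(f"Omega = {omega_hz:.0e} Hz -> ratio = {overshoot_ratio(D1_CM2_S, A_CM, omega_hz):.3g}")
```

With these numbers $D_1/a^2 \approx 9\times10^5$ Hz, so overshoot is strong for $\Omega$ in the kHz range and only marginal near $10^6$ Hz.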

Slutsky and Mirny [7] proposed that conformational fluctuations could ease the conflict between speed and fidelity. If some conformations of the transcription factor are sensitive to the DNA sequence while others are characterized by rapid transport, then the transcription factor might be able to scan the genome efficiently by rapidly flipping between the two types of conformations. The aim of this paper is to analyze how closely this mechanism can approach the limits of search efficiency imposed by fundamental principles of thermodynamics. We address this question by examining a simple model for the conformational fluctuations, similar to that of Ref. [7], in which the transcription factor can adopt only two conformations (+ and −) when in contact with non-operator DNA.

FIG. 1: (Color online) Schematic representation of the model. A protein moving diffusively through the cell volume (a) is adsorbed on genomic DNA (b), where it adopts one of two conformations: + and −. In the + conformation it is loosely associated with the DNA and can move by one-dimensional diffusion along the DNA chain (b), while in the − conformation (c) it is tightly associated with the DNA and is immobile. After returning to the + state, it resumes the sliding motion. The protein can also desorb from the chain (d) and return to three-dimensional diffusive motion. Following a number of such cycles, the protein lands in the "antenna region" within a distance $\lambda$ of the target binding site (e). After reaching the target site by one-dimensional diffusion, it can undergo a large-scale irreversible conformational transition to the final bound state if it is in the − state (f).

As illustrated in Fig. 1, in the + state the protein is less ordered and only loosely associated with the DNA, and it can slide along the DNA chain. In the − state, the protein is more ordered, closely associated with the DNA, and immobile [8]. If the transcription factor is in contact with the target operator sequence then, in addition to these two states, it can also undergo an irreversible conformational transition from the − state to the fully ordered final bound state. We will show that the shortest possible binding time in this model is controlled by a dimensionless binding rate $\omega = 2\Omega a b/\sqrt{K D_1 D_3}$, with $D_3$ the protein diffusion coefficient in bulk solution, $D_1$ the diffusion coefficient for 1D transport along the DNA in the + state, $K$ the equilibrium constant for the non-specific protein-DNA interaction, and $b$ the DNA-protein "capture radius" [9]. If the dimensionless binding rate is comparable to or larger than one, then we can show that, for a particular value of the energy difference $\Delta E_\pm$ between the + and − conformations, the binding time can approach an absolute lower bound that corresponds to proteins with infinitely fast final binding rates. In other words, if the internal degrees of freedom of the protein in the sliding state are properly matched to the final binding rate, then the binding time of a transcription factor can approach the shortest value allowed by thermodynamics, provided the dimensionless binding rate is sufficiently large.

To demonstrate these claims, consider a cell of volume $V$ containing a DNA genome of length $L$. The cell also contains a certain (low) concentration $c$ of transcription factor proteins that can bind reversibly and non-specifically to the DNA. A protein whose center is located inside a cylindrical tube of radius $b$ surrounding the duplex DNA will be assumed to be non-specifically associated with the DNA. The fraction $\phi$ of the total cell volume occupied by the tube is of order $Lb^2/V$. There is also a single target site on the strand where the transcription factor can bind irreversibly. We start by applying a fundamental theorem [10] which, in terms of our model, states that the mean waiting time for irreversible occupation of the target site (the quantity of interest to us) is equal to the inverse of the steady-state diffusion current of a different problem: one in which the target site is replaced by a protein sink that constantly absorbs transcription factors in the − state located at the target site at a rate $\Omega$, while the protein concentration far from the target site is maintained at a fixed value $c_3(\infty)$. The steady-state diffusion current into the target site for this second problem, denoted by $J_{3D}$, can be obtained from a straightforward solution of the diffusion equation, which leads to the well-known Smoluchowski relation for the rate of diffusion-limited chemical reactions:

$$J_{3D} \simeq D_3\, c_3(\infty)\, \xi . \qquad (1)$$

Following Ref. [11], the effective "target radius" $\xi$ is defined as the radius of a sphere surrounding the target site that marks a crossover: far outside the sphere, adsorption of proteins onto the DNA chain is in equilibrium with evaporation of proteins from the chain, while deep inside the sphere the adsorption rate exceeds the evaporation rate. For transcription factors obeying BWH slide-jump transport, the size of this target sphere is determined by the condition that a protein landing on a DNA segment inside the target sphere after a 3D diffusion step typically reaches the target sink by pure 1D diffusion, and is absorbed there, before it has a chance to "evaporate" and leave the DNA. The length $\lambda$ of DNA chain inside this target sphere, referred to as the "antenna" length, in general depends on the spatial organization of the genome. We assume here the simple case of a straight genome, with $\xi$ of order $\lambda$ [12]. The antenna length has to be determined self-consistently, but first we must establish a relation between $c_3(\infty)$ and the actual protein concentration $c$.

Far outside the target sphere the DNA-protein system is, by assumption, nearly in local thermal equilibrium, so the concentrations of adsorbed and free proteins can be determined purely from equilibrium considerations. If one views the association of the transcription factors with DNA as a simple chemical reaction, then the concentration $c(\infty)$ of proteins adsorbed non-specifically on the DNA and the concentration $c_3(\infty)$ of free proteins must be related to the reaction volume fraction $\phi$ by the Law of Mass Action for dilute chemical systems in thermodynamic equilibrium:

$$\frac{c(\infty)}{c_3(\infty)} \simeq \frac{\phi}{K}, \qquad (2)$$

with $\phi \ll 1$. The non-specific protein-DNA equilibrium constant $K$ depends strongly on the salt concentration and other thermodynamic parameters, but it is independent of the protein and DNA concentrations. Since $c = c_3(\infty) + c(\infty)$, the concentrations of free and adsorbed proteins are now determined, but it will be useful to replace the bulk concentration $c(\infty)$ of adsorbed proteins by the 1D concentration $c_1(\infty) \simeq b^2 c(\infty)/\phi$, the number of adsorbed proteins per unit length of DNA far from the target site. Solving for $c_1(\infty)$ and $c_3(\infty)$ gives $c_1(\infty) \simeq c b^2/[K(1+\phi/K)]$ and $c_3(\infty) \simeq c/(1+\phi/K)$, still for $\phi \ll 1$.
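Eq. (2) together with $c = c_3(\infty) + c(\infty)$ is a two-line calculation. The Python sketch below (function names are ours) returns the free, adsorbed and per-length concentrations, and inverts the free fraction for $K$, which is the step used later to infer $K \sim 10^{-3}$ from the roughly 10% of lac repressors found in solution with $\phi \approx 0.01$. It inherits the dilute-limit assumption $\phi \ll 1$.

```python
def partition(c: float, phi: float, K: float, b: float):
    """Solve Eq. (2) together with c = c3 + c_ads (dilute limit, phi << 1).

    Returns (c3, c_ads, c1): free bulk concentration, bulk concentration of
    adsorbed proteins, and adsorbed proteins per unit DNA length, b^2 * c_ads / phi.
    """
    c3 = c / (1.0 + phi / K)
    c_ads = c - c3
    c1 = b * b * c_ads / phi
    return c3, c_ads, c1

def K_from_free_fraction(free_fraction: float, phi: float) -> float:
    """Invert c3 / c = 1 / (1 + phi / K) for K."""
    return phi * free_fraction / (1.0 - free_fraction)

# About 10% of lac repressors free in solution, phi ~ 0.01 (values from the text)
K_lac = K_from_free_fraction(0.10, 0.01)
print(f"K ~ {K_lac:.1e}")
```

The per-length concentration returned by `partition` reproduces $c_1(\infty) \simeq c b^2/[K(1+\phi/K)]$, since $c(\infty) = c\,(\phi/K)/(1+\phi/K)$.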

Deep inside the target sphere the system is not in thermal equilibrium: the adsorption rate of proteins from the bulk solution onto the DNA exceeds the evaporation rate. The difference is matched by a 1D diffusion current $J_{1D}$ along the DNA chain towards the target site. To estimate this 1D diffusional transport, note that if the interconversion rate between the + and − states is sufficiently rapid, then their respective occupancies can be approximated by the equilibrium Boltzmann distribution. The effective 1D diffusion constant for transport along the chain, which we denote by $\bar D_1$, is then proportional to the Boltzmann probability of finding the protein in the + state. If $\mu = \exp(-\Delta E_\pm/k_B T)$, then $p(+) = \mu/(1+\mu)$ and $\bar D_1 \simeq D_1\mu/(1+\mu)$. Similarly, the effective target-site binding rate $\bar\Omega$ is, under these same conditions, proportional to the probability $p(-) = 1 - p(+)$ of finding the protein in the − state, so $\bar\Omega \simeq \Omega/(1+\mu)$.
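The fast-interconversion approximation amounts to weighting each rate by a Boltzmann occupation probability. A minimal Python sketch, assuming the energy gap is supplied in units of $k_B T$ and with an illustrative function name:

```python
import math

def boltzmann_rates(d1: float, omega: float, delta_e_over_kt: float):
    """Effective 1D diffusion constant and binding rate for fast +/- interconversion.

    mu = exp(-Delta E / kT) is the occupation ratio p(+)/p(-), so p(+) = mu/(1+mu).
    Returns (mu, D1_eff, Omega_eff) with D1_eff = D1 * p(+) and Omega_eff = Omega * p(-).
    """
    mu = math.exp(-delta_e_over_kt)
    p_plus = mu / (1.0 + mu)
    return mu, d1 * p_plus, omega * (1.0 - p_plus)

# Equal energies: both rates are halved
print(boltzmann_rates(1e-9, 3e3, 0.0))
```

Raising $\Delta E_\pm$ trades mobility for recognition: $\bar D_1$ falls while $\bar\Omega$ rises, and $\bar D_1/D_1 + \bar\Omega/\Omega = 1$ for any gap.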

Let $c_1(0)$ be the 1D concentration at the target site. If the final binding rate were infinitely fast, then $c_1(0)$ would be zero but, because of the overshoot effect, this is no longer the case. If we view the surface of the target sphere as a matching region between the asymptotic region far from the sink, where the 1D concentration approaches $c_1(\infty)$, and the region deep inside the target sphere near the sink, where the 1D concentration approaches $c_1(0)$, then we can estimate the 1D concentration gradient as $[c_1(\infty) - c_1(0)]/\lambda$. It follows that the 1D diffusion current towards the sink equals:

$$J_{1D} \simeq \bar D_1\, \frac{c_1(\infty) - c_1(0)}{\lambda} . \qquad (3)$$

The number of proteins absorbed per second by the sink itself, $J_s$, is of order $a\, c_1(0)\, \bar\Omega$, with $a$ the spacing between protein binding sites. Conservation of the number of proteins requires the three currents $J_{3D}$, $J_{1D}$ and $J_s$ to be equal to each other [11], so

$$J_{3D} \simeq J_{1D} \simeq J_s . \qquad (4)$$



Equating the 1D diffusion current with the sink current allows us to eliminate $c_1(0)$, with the result:

$$J_{1D} \simeq \frac{\bar D_1\, c_1(\infty)}{\lambda}\left[\frac{\bar\Omega}{\bar\Omega + \bar D_1/(a\lambda)}\right] . \qquad (5)$$

FIG. 2: (Color online) Contour plots of the transport enhancement factor $\Lambda$ as a function of the equilibrium constant $K$ and the occupation probability ratio $\mu$ of the + over − states for $D_1 = 10^{-9}$ cm$^2$/s, $D_3 = 3\times10^{-7}$ cm$^2$/s, $a = 0.34$ nm, $b = 5$ nm, $\phi = 0.01$ and $\Omega = 3\times10^3$ Hz. There is a shallow maximum around $\mu = 0.1$ and $K = 10^{-3}$. The ratio of the transport enhancement factor at this maximum, $\Lambda_{\rm opt}$, and the thermodynamic limiting enhancement factor $\Lambda_\infty$ equals 0.193. Inset: Dependence of the ratio $\Lambda_{\rm opt}/\Lambda_\infty$ on the dimensionless binding rate $\omega$.

The factor in front of the square brackets is the diffusion current in the absence of overshoot. The importance of overshoot is thus determined by the dimensionless number $a\lambda\bar\Omega/\bar D_1$. Since $\lambda^2/\bar D_1$ is the typical time spent by a protein diffusing along the antenna, and this time is shared among roughly $\lambda/a$ sites, it follows that $a\lambda/\bar D_1$ is the typical time spent near the target site, so $a\lambda\bar\Omega/\bar D_1$ is the product of the typical time spent near the target site and the effective absorption rate. The term inside the square brackets can then be understood as the probability for a protein in the antenna region to be trapped by the target.

Equating the 1D and 3D currents provides a self-consistency condition that determines both the antenna length $\lambda$ and the reaction rate. Solving for $\lambda$ using Eqs. (1) and (5), with $\xi \simeq \lambda$ and $c_1(\infty)/c_3(\infty) \simeq b^2/K$, gives the antenna length:

$$\lambda \simeq \frac{\bar D_1}{2a\bar\Omega}\left[\sqrt{1 + \frac{4a^2b^2\bar\Omega^2}{K D_3 \bar D_1}} - 1\right] . \qquad (6)$$

The maximum value, $\lambda_\infty = \sqrt{b^2 D_1/(K D_3)}$, is reached for infinite $\Omega$ and infinite $\mu$.
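Eq. (6) is the positive root of the quadratic obtained by inserting Eqs. (1) and (5) into $J_{3D} = J_{1D}$ with $\xi \simeq \lambda$, namely $D_3\bar\Omega\lambda^2 + (D_3\bar D_1/a)\lambda - \bar D_1 b^2\bar\Omega/K = 0$. The Python sketch below (hypothetical names; currents divided by $c_3(\infty)$ and order-one prefactors dropped, as in the text) evaluates that root in a cancellation-free form, so one can check both the current balance and the limit $\lambda \to \lambda_\infty$.

```python
import math

def antenna_length(d1_eff, omega_eff, a, b, K, d3):
    """Antenna length lambda from Eq. (6), CGS units.

    Uses sqrt(1 + x^2) - 1 = x^2 / (sqrt(1 + x^2) + 1) to avoid cancellation
    when x = 2*a*b*Omega_eff / sqrt(K*D3*D1_eff) is small.
    """
    x = 2.0 * a * b * omega_eff / math.sqrt(K * d3 * d1_eff)
    return d1_eff / (2.0 * a * omega_eff) * x * x / (math.sqrt(1.0 + x * x) + 1.0)

def currents_per_c3(lam, d1_eff, omega_eff, a, b, K, d3):
    """(J_3D, J_1D) divided by c3(inf): Eq. (1) with xi = lambda, and Eq. (5)."""
    j3d = d3 * lam
    c1_over_c3 = b * b / K                      # c1(inf)/c3(inf) ~ b^2/K
    trap_prob = omega_eff / (omega_eff + d1_eff / (a * lam))
    j1d = d1_eff * c1_over_c3 / lam * trap_prob
    return j3d, j1d

def lambda_inf(d1, b, K, d3):
    """Overshoot-free limit sqrt(b^2 D1 / (K D3))."""
    return math.sqrt(b * b * d1 / (K * d3))

# Parameters of Fig. 2 (CGS), with mu = 0.1 used for the Boltzmann weights
a, b, K, d1, d3, omega, mu = 0.34e-7, 5e-7, 1e-3, 1e-9, 3e-7, 3e3, 0.1
d1_eff, omega_eff = d1 * mu / (1 + mu), omega / (1 + mu)
lam = antenna_length(d1_eff, omega_eff, a, b, K, d3)
j3d, j1d = currents_per_c3(lam, d1_eff, omega_eff, a, b, K, d3)
print(f"lambda = {lam * 1e7:.2f} nm, lambda_inf = {lambda_inf(d1, b, K, d3) * 1e7:.2f} nm")
print(f"J3D/J1D = {j3d / j1d:.6f}")
```

Passing the bare $D_1$ and a very large $\Omega$ to `antenna_length` recovers $\lambda_\infty$, as stated above.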

It will be helpful to express the binding rate $J_{3D} \simeq D_3\, c_3(\infty)\,\xi$ in dimensionless units as $\Lambda = J_{3D}/(c D_3 a)$, with $c D_3 a$ the Smoluchowski limiting rate of a conventional 3D diffusive search for an absorbing target of radius $a$ (the spacing between binding sites), so $\Lambda$ can be viewed as a reaction amplification or enhancement factor. This enhancement factor can be expressed as a simple function of the dimensionless binding rate $\omega = 2\Omega a b/\sqrt{K D_1 D_3}$ and the Boltzmann factor $\mu = \exp(-\Delta E_\pm/k_B T)$:

$$\Lambda(\omega,\mu) \simeq \Lambda_\infty\, \frac{\mu}{\omega}\left[\sqrt{1 + \frac{\omega^2}{\mu(1+\mu)}} - 1\right] . \qquad (7)$$



Here $\Lambda_\infty = (b/a)\sqrt{K D_1/D_3}/(K+\phi)$ is the maximum value of the enhancement factor, corresponding to $\lambda = \lambda_\infty$ with both $\mu$ and $\Omega$ infinite. We will examine the amplification factor $\Lambda(\omega,\mu)$ as a function of the non-specific equilibrium constant $K$ and the occupation ratio $\mu$ of the + and − states, rather than of $\omega$ and $\mu$, because these are physical parameters characterizing the interaction between the transcription factor and the DNA that are expected to be sensitive to specific point mutations of the transcription factor amino-acid sequence, through their exponential dependence on binding and activation energies. The contour lines of constant $\Lambda$ as a function of $K$ and $\mu$ in Fig. 2 show that there is a single, rather shallow maximum. The physical origin of the maximum of $\Lambda$ with respect to $K$ is, as discussed earlier, the fact that a combination of 1D and 3D diffusion minimizes the search time. By contrast, the maximum of $\Lambda$ as a function of $\mu$ at $\mu_{\rm opt} = (\sqrt{1+2\omega} - 1)/2$ is surprising, because one might have expected that for sufficiently long DNA, locating the target site should always be the rate-limiting step, in which case the optimal choice for $\mu$ would be infinite since that maximizes the effective 1D diffusion constant $\bar D_1 = D_1\mu/(1+\mu)$. It can be shown that the maximum with respect to $\mu$ is actually a form of impedance matching, with the effective "resistance" of the 1D diffusional search matched to the effective resistance of the binding process.
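To see the optimum in $\mu$ directly, one can maximize Eq. (7) numerically and compare with the closed form $\mu_{\rm opt} = (\sqrt{1+2\omega}-1)/2$. The Python sketch below uses a golden-section search in $\log\mu$, an optimizer of our own choosing that relies on the single maximum noted above; the function names are illustrative.

```python
import math

def enhancement_ratio(omega: float, mu: float) -> float:
    """Lambda / Lambda_inf from Eq. (7)."""
    y = omega * omega / (mu * (1.0 + mu))
    # sqrt(1 + y) - 1 rewritten as y / (sqrt(1 + y) + 1) to avoid cancellation for small y
    return (mu / omega) * y / (math.sqrt(1.0 + y) + 1.0)

def mu_opt(omega: float) -> float:
    """Closed-form optimum quoted in the text."""
    return (math.sqrt(1.0 + 2.0 * omega) - 1.0) / 2.0

def mu_opt_numeric(omega: float, mu_lo: float = 1e-6, mu_hi: float = 1e6, iters: int = 200) -> float:
    """Golden-section search for the maximum of Eq. (7) in t = log(mu).

    Assumes Lambda has a single maximum in [mu_lo, mu_hi].
    """
    g = (math.sqrt(5.0) - 1.0) / 2.0            # inverse golden ratio, ~0.618
    lo, hi = math.log(mu_lo), math.log(mu_hi)

    def f(t: float) -> float:
        return enhancement_ratio(omega, math.exp(t))

    c, d = hi - g * (hi - lo), lo + g * (hi - lo)
    for _ in range(iters):
        if f(c) > f(d):          # maximum lies in [lo, d]
            hi, d = d, c
            c = hi - g * (hi - lo)
        else:                    # maximum lies in [c, hi]
            lo, c = c, d
            d = lo + g * (hi - lo)
    return math.exp(0.5 * (lo + hi))

for w in (0.1, 1.0, 10.0, 100.0):
    print(f"omega = {w:6.1f}: mu_opt = {mu_opt(w):.4f} (closed form), "
          f"{mu_opt_numeric(w):.4f} (numeric), ratio = {enhancement_ratio(w, mu_opt(w)):.3f}")
```

For $\omega = 4$ both approaches give $\mu_{\rm opt} = 1$, that is, $\Delta E_\pm = 0$.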

If $\mu$ adopts the optimal value $\mu_{\rm opt} = (\sqrt{1+2\omega} - 1)/2$, then the ratio $\Lambda_{\rm opt}/\Lambda_\infty$ of the optimal rate amplification factor and its maximum value is a function only of the dimensionless rate $\omega$:

$$\frac{\Lambda_{\rm opt}}{\Lambda_\infty} = 1 - \frac{1}{\omega}\left(\sqrt{1+2\omega} - 1\right) . \qquad (8)$$

The dependence of $\Lambda_{\rm opt}/\Lambda_\infty$ on $\omega$ is shown in the inset of Fig. 2: $\Lambda_{\rm opt}$ is of the same order of magnitude as the theoretical limit $\Lambda_\infty$ already for modest values of $\omega$ (Eq. (8) gives about 0.27 for $\omega = 1$ and about 0.64 for $\omega = 10$). This demonstrates our central claim: it is possible for the overall binding rate of a transcription factor to approach the theoretical limiting value, but only by a suitable choice of $\mu$, and only if the dimensionless binding rate $\omega$ is of order one or larger.
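The closed forms for $\mu_{\rm opt}$ and Eq. (8) are linked by a short algebraic step, supplied here as a reading aid rather than reproduced from the original derivation. Writing $r = \sqrt{1+2\omega}$, so that $\mu_{\rm opt} = (r-1)/2$ and $2\omega = r^2 - 1$, the argument of the square root in Eq. (7) becomes a perfect square at the optimum:

$$
\mu_{\rm opt}(1+\mu_{\rm opt}) = \frac{(r-1)(r+1)}{4} = \frac{r^2-1}{4} = \frac{\omega}{2}
\quad\Longrightarrow\quad
1 + \frac{\omega^2}{\mu_{\rm opt}(1+\mu_{\rm opt})} = 1 + 2\omega = r^2 ,
$$

$$
\frac{\Lambda_{\rm opt}}{\Lambda_\infty} = \frac{\mu_{\rm opt}}{\omega}\,(r-1) = \frac{(r-1)^2}{2\omega} = \frac{(r-1)^2}{(r-1)(r+1)} = \frac{r-1}{r+1} = 1 - \frac{2}{r+1} = 1 - \frac{r-1}{\omega} .
$$

The form $(r-1)/(r+1)$ makes the limits explicit: the ratio grows like $\omega/2$ for $\omega \ll 1$ and approaches 1 for $\omega \gg 1$.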

Are these two conditions realistic for typical transcription factors? Typical values for the diffusion constants of bacterial transcription factors are [11, 13] $D_1 \sim 10^{-9}$ cm$^2$/s and $D_3 \sim 3\times10^{-7}$ cm$^2$/s. We can estimate the protein-DNA reaction volume fraction $\phi$ for E. coli by assuming it to be comparable to the DNA volume fraction (about 1%). The equilibrium constant can then be determined from the relation $c_3(\infty) \simeq c/(1+\phi/K)$ and the fact that about 10% of the lac repressor proteins of E. coli are known to be in solution [15], which means that $K$ must be of order $10^{-3}$. If we take $a$ equal to the base-pair spacing of 0.34 nm and estimate $b$ as 5 nm, then the dimensionless binding rate $\omega$ is of order $10^{-4}\,\Omega$, with the binding rate $\Omega$ expressed in Hz. A large-scale protein conformational change typically involves time scales between microseconds and milliseconds, so $\Omega$ lies between about $10^3$ and $10^6$ Hz and $\omega$ must lie in the range of 0.1 to 100. Note from Fig. 2 that the optimal value of $K$ is close to $10^{-3}$ for $\Omega$ in the kHz range. We conclude that the second condition can be satisfied under typical conditions. Next, the optimal occupation ratio $\mu_{\rm opt} = (\sqrt{1+2\omega} - 1)/2$ lies between about 0.05 and 7 for $\omega$ in the range of 0.1 to 100. The corresponding optimal energy difference $\Delta E_\pm = -k_B T \ln\mu_{\rm opt}$ between the + and − states is then in the range of a few $k_B T$, with $\Delta E_\pm$ positive for $\omega < 4$ but negative for $\omega > 4$. In either case, the structure of "optimized" transcription factors bound to non-operator DNA should be subject to strong thermal fluctuations. As we saw, this is indeed the case for the lac repressor [6], while a recent modeling study of the Ets-DNA system arrives at the same conclusion.
The first condition can thus be satisfied as well under reasonable conditions. Finally, the measured lac repressor binding rates [8] are comparable to the thermodynamic limiting rate. We conclude that, under reasonable conditions, the binding rate of transcription factor proteins can be of the same order of magnitude as the thermodynamic limiting rate if the energy spectrum of conformational fluctuations is determined, under selective pressure, by minimization of the overall binding time.
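The order-of-magnitude estimates in the preceding discussion can be reproduced with a short Python sketch using the quoted values ($a = 0.34$ nm, $b = 5$ nm, $K = 10^{-3}$, $D_1 = 10^{-9}$ cm$^2$/s, $D_3 = 3\times10^{-7}$ cm$^2$/s). The helper names are ours, and the optimal energy gap is taken as $\Delta E_\pm = -k_BT\ln\mu_{\rm opt}$, reported in units of $k_BT$.

```python
import math

def dimensionless_rate(omega_hz, a, b, K, d1, d3):
    """omega = 2*Omega*a*b / sqrt(K*D1*D3); CGS units (cm, cm^2/s), Omega in Hz."""
    return 2.0 * omega_hz * a * b / math.sqrt(K * d1 * d3)

def mu_opt(omega):
    """Optimal occupation ratio (sqrt(1 + 2*omega) - 1)/2."""
    return (math.sqrt(1.0 + 2.0 * omega) - 1.0) / 2.0

def optimal_gap_kT(omega):
    """Delta E_pm / k_B T at the optimum, from mu = exp(-Delta E_pm / k_B T)."""
    return -math.log(mu_opt(omega))

# Typical values quoted in the text (CGS)
A_CM, B_CM, K, D1, D3 = 0.34e-7, 5e-7, 1e-3, 1e-9, 3e-7

prefactor = dimensionless_rate(1.0, A_CM, B_CM, K, D1, D3)  # omega per Hz of Omega
print(f"omega ~ {prefactor:.1e} * Omega[Hz]")
for omega_hz in (1e3, 1e6):  # millisecond and microsecond conformational changes
    w = dimensionless_rate(omega_hz, A_CM, B_CM, K, D1, D3)
    print(f"Omega = {omega_hz:.0e} Hz: omega = {w:.3g}, mu_opt = {mu_opt(w):.3g}, "
          f"Delta E = {optimal_gap_kT(w):+.2f} kT")
```

With these inputs the prefactor is about $6\times10^{-5}$ per Hz, which the text rounds to $10^{-4}$; the sign change of the optimal gap at $\omega = 4$ follows from $\mu_{\rm opt}(4) = 1$.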